Eligibility Traces
From ME539, Oregon State
The last slide from these lecture notes:
Gradient-descent Sarsa(λ):
$\Delta \vec \theta_t = \alpha \delta_t \vec e_t$
where
$\delta_t = r_{t+1} + \gamma Q_t(s_{t+1},a_{t+1}) - Q_t(s_t,a_t)$
$\vec e_t = \gamma \lambda \vec e_{t-1} + \nabla_{\vec \theta} Q_t(s_t,a_t)$
I understand and have implemented most of that, in one place or another. It's just the last term of that last line, $\nabla_{\vec \theta} Q_t(s_t,a_t)$, that I need to figure out.
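In the simplest case, a linear approximator with $Q_t(s,a) = \vec\theta_t \cdot \vec\phi(s,a)$, that gradient is just the feature vector $\vec\phi(s,a)$, and the whole update fits in a few lines. A rough numpy sketch under that assumption (names like `phi_sa` and `lam` are mine, not from the slides):

```python
import numpy as np

def sarsa_lambda_step(theta, e, phi_sa, phi_next, r, alpha, gamma, lam, terminal=False):
    """One gradient-descent Sarsa(lambda) update with accumulating traces,
    assuming a linear approximator so grad_theta Q(s, a) = phi(s, a)."""
    q_sa = theta @ phi_sa
    q_next = 0.0 if terminal else theta @ phi_next
    delta = r + gamma * q_next - q_sa      # TD error delta_t
    e = gamma * lam * e + phi_sa           # e_t = gamma*lambda*e_{t-1} + grad_theta Q
    theta = theta + alpha * delta * e      # Delta theta_t = alpha * delta_t * e_t
    return theta, e
```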
TF implementation
As usual with TensorFlow implementations, it’s very hard to follow.
Do you only need to update traces with the derivative of the output with respect to the weights of the last (output) layer, or for every layer?
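Reading the trace formula literally, $\vec e$ has the same dimensionality as $\vec\theta$, which would mean keeping a trace for every trainable weight in every layer, not just the output layer. Here's a rough sketch of what that could look like for a one-hidden-layer network with a scalar Q output, with the gradients written out by hand (all names and shapes are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))     # hidden-layer weights
w2 = rng.normal(scale=0.1, size=n_hid)             # output-layer weights (scalar Q)
e_W1, e_w2 = np.zeros_like(W1), np.zeros_like(w2)  # one trace per parameter array

def q_and_grads(x):
    """Forward pass and the gradient of Q with respect to every layer's weights."""
    h = np.tanh(W1 @ x)
    q = w2 @ h
    grad_w2 = h                              # dQ/dw2
    grad_W1 = np.outer(w2 * (1 - h**2), x)   # dQ/dW1 via the chain rule
    return q, grad_W1, grad_w2

def update_traces(e_W1, e_w2, grad_W1, grad_w2, gamma, lam):
    """e_t = gamma*lambda*e_{t-1} + grad_theta Q, applied to every layer's trace."""
    return gamma * lam * e_W1 + grad_W1, gamma * lam * e_w2 + grad_w2
```

The weight update would then be `W1 += alpha * delta * e_W1` and `w2 += alpha * delta * e_w2`, mirroring the $\alpha \delta_t \vec e_t$ rule above.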
Biologically plausible learning in RNNs
$e_{i,j}(t) = e_{i,j}(t-1) + S(r_j(t-1)(x_i(t) - \bar x_i))$
“where $r_j$ represents the output of neuron j, and thus the current input at this synapse. $x_i$ represents the current excitation (or potential) of neuron i and $\bar x_i$ represents a short-term running average of $x_i$, and thus $x(t) - \bar x$ tracks the fast fluctuations of neuron output.”
S must be a monotonic, supralinear function. In this paper they used the cubic function $S(x) = x^3$.
This one seems relatively easy to implement, although it does seem like it would need a decay parameter on the previous values.
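A minimal sketch of that update, vectorized over all synapses $(i, j)$; the `trace_decay` argument is the decay parameter suggested above (my addition, not something from the paper), and `tau` for the running average is an assumed constant:

```python
import numpy as np

def S(x):
    return x ** 3   # the supralinear function used in the paper

def update_trace(e, r_prev, x, x_bar, trace_decay=1.0):
    """e[i, j] += S(r_prev[j] * (x[i] - x_bar[i])); trace_decay is an optional extra."""
    hebb = S(np.outer(x - x_bar, r_prev))   # elementwise S of the outer product over (i, j)
    return trace_decay * e + hebb

def update_running_average(x_bar, x, tau=0.9):
    """Short-term exponential running average of x (tau is an assumed constant)."""
    return tau * x_bar + (1.0 - tau) * x
```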